Author

E.D. Gennatas

Published

February 27, 2026

rtemis Workshop Demo

rtemis Home | Documentation | GitHub

Load Packages

library(rtemis)
  .:rtemis 1.0.0 🌊 aarch64-apple-darwin20
library(data.table)

Read & Inspect Data

dat <- read("../Data/data.xlsx")
2026-02-26 20:26:34  Reading data.xlsx using readxl::read_excel()... [read]
2026-02-26 20:26:34 Read in 500 x 7 [read]
2026-02-26 20:26:34  Done in 0.11 seconds. [read]
inspect(dat)
<data.table> 500 x 7
              Lab: <chr> Lab E, Lab I, Lab E, Lab E...
         Organism: <chr> No Significant Growth, No growth, Normal flora, Candida spp....
              Sex: <chr> Male, Female, Female, Female...
       Department: <chr> Out Patient Department, Pediatric ward, Out Patient Department, Gynaecology ward...
             Year: <nmr> 2021.00, 2023.00, 2021.00, 2021.00...
     Specimentype: <chr> Urine, Blood, Stool, Cervical swab...
Hospitalized48hrs: <chr> No, No, No, No...

Check data

check_data(dat)
  dat: A data.table with 500 rows and 7 columns.

  Data types
  * 1 numeric feature
  * 0 integer features
  * 0 factors
  * 6 character features
  * 0 date features

  Issues
  * 0 constant features
  * 118 duplicate cases
  * 0 missing values

  Recommendations
  * Consider converting character features to factors or excluding them.
  * Consider removing the duplicate cases.
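For reference, both recommendations could also be addressed directly with data.table (a generic sketch, independent of the rtemis preprocessing used in this demo):

```r
# Generic data.table sketch (not rtemis-specific):
# drop duplicate rows, then convert character columns to factors in-place
datd <- unique(dat)
chr_cols <- names(datd)[sapply(datd, is.character)]
datd[, (chr_cols) := lapply(.SD, factor), .SDcols = chr_cols]
```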

Preprocess data

Create a Preprocessor object:

prp <- preprocess(
  dat,
  config = setup_Preprocessor(character2factor = TRUE, remove_duplicates = TRUE)
)
2026-02-26 20:26:34 Removing 118 duplicate cases... [preprocess]
2026-02-26 20:26:34 Converting 6 character features to factors... [preprocess]
2026-02-26 20:26:34 Preprocessing done. [preprocess]

Get the preprocessed data:

datp <- preprocessed(prp)

Re-check data:

check_data(datp)
  datp: A data.table with 382 rows and 7 columns.

  Data types
  * 1 numeric feature
  * 0 integer features
  * 6 factors, of which 0 are ordered
  * 0 character features
  * 0 date features

  Issues
  * 0 constant features
  * 0 duplicate cases
  * 0 missing values

  Recommendations
  * Everything looks good

Train Models

We train four models using different algorithms but the same outer resampling folds.
Note that we do only minimal tuning to keep the demo runtime short.
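Passing the same `seed` to `setup_Resampler()` is what keeps the outer folds identical across the four runs. Conceptually, the same seed reproduces the same random fold assignment (a base-R illustration of the idea, not rtemis internals):

```r
# Same seed => same random fold assignment
set.seed(650)
folds_a <- sample(rep(1:10, length.out = 382))
set.seed(650)
folds_b <- sample(rep(1:10, length.out = 382))
identical(folds_a, folds_b)  # TRUE
```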

GLMNET

hospitalized48_glmnet <- train(
  datp,
  algorithm = "glmnet",
  outer_resampling_config = setup_Resampler(seed = 650)
)
2026-02-26 20:26:35  [train]
2026-02-26 20:26:35 Training set: 382 cases x 6 features. [summarize_supervised]
2026-02-26 20:26:35 Tuning parallelization enabled. [get_n_workers]
2026-02-26 20:26:35 // Max workers: 7 => Algorithm: 1; Tuning: 7; Outer Resampling: 1 [get_n_workers]
2026-02-26 20:26:35 <> Training GLMNET Classification using 10 independent folds... [train]
2026-02-26 20:26:35 Input contains more than one column; stratifying on last. [resample]
2026-02-26 20:26:35 Using max n bins possible = 2. [kfold]
2026-02-26 20:26:58 </> Outer resampling done. [train]
<Resampled Classification Model>
GLMNET (Elastic Net)
⚙ Tuned using exhaustive grid search.
⟳ Tested using 10 independent folds.

  <Resampled Classification Training Metrics>
    Showing mean (sd) across resamples.
          Sensitivity: 0.906 (3e-03)
          Specificity: 0.629 (0.033)
    Balanced_Accuracy: 0.767 (0.017)
                  PPV: 0.716 (0.018)
                  NPV: 0.866 (0.008)
                   F1: 0.800 (0.012)
             Accuracy: 0.770 (0.017)
                  AUC: 0.850 (0.008)
          Brier_Score: 0.172 (0.006)

  <Resampled Classification Test Metrics>
    Showing mean (sd) across resamples.
          Sensitivity: 0.881 (0.061)
          Specificity: 0.580 (0.122)
    Balanced_Accuracy: 0.731 (0.075)
                  PPV: 0.688 (0.065)
                  NPV: 0.823 (0.102)
                   F1: 0.771 (0.056)
             Accuracy: 0.733 (0.075)
                  AUC: 0.787 (0.067)
          Brier_Score: 0.191 (0.025)

2026-02-26 20:26:58  Done in 23.24 seconds. [train]
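In the reports above, Balanced Accuracy is the mean of Sensitivity and Specificity, which you can verify from the printed test metrics (up to rounding of the displayed values):

```r
# Balanced Accuracy = (Sensitivity + Specificity) / 2
# Using the GLMNET mean test metrics printed above:
(0.881 + 0.580) / 2  # 0.7305, matching the printed 0.731
```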

Plot ROC curve:

plot_roc(
  hospitalized48_glmnet,
  main = "GLMNET"
)

Plot variable importance:

plot_varimp(
  hospitalized48_glmnet,
  show_top = 11L
)

CART

hospitalized48_cart <- train(
  datp,
  algorithm = "cart",
  outer_resampling_config = setup_Resampler(seed = 650)
)
2026-02-26 20:26:58  [train]
2026-02-26 20:26:58 Training set: 382 cases x 6 features. [summarize_supervised]
2026-02-26 20:26:58 // Max workers: 7 => Algorithm: 1; Tuning: 1; Outer Resampling: 7 [get_n_workers]
2026-02-26 20:26:58 <> Training CART Classification using 10 independent folds... [train]
2026-02-26 20:26:58 Input contains more than one column; stratifying on last. [resample]
2026-02-26 20:26:58 Using max n bins possible = 2. [kfold]
2026-02-26 20:26:58 </> Outer resampling done. [train]
<Resampled Classification Model>
CART (Classification and Regression Trees)
⟳ Tested using 10 independent folds.

  <Resampled Classification Training Metrics>
    Showing mean (sd) across resamples.
          Sensitivity: 0.944 (0.020)
          Specificity: 0.781 (0.032)
    Balanced_Accuracy: 0.863 (0.010)
                  PPV: 0.817 (0.020)
                  NPV: 0.932 (0.019)
                   F1: 0.876 (0.008)
             Accuracy: 0.864 (0.010)
                  AUC: 0.903 (0.020)
          Brier_Score: 0.106 (0.007)

  <Resampled Classification Test Metrics>
    Showing mean (sd) across resamples.
          Sensitivity: 0.824 (0.063)
          Specificity: 0.659 (0.123)
    Balanced_Accuracy: 0.742 (0.072)
                  PPV: 0.720 (0.076)
                  NPV: 0.784 (0.069)
                   F1: 0.767 (0.060)
             Accuracy: 0.744 (0.071)
                  AUC: 0.761 (0.089)
          Brier_Score: 0.204 (0.055)

2026-02-26 20:26:58  Done in 0.20 seconds. [train]
plot_roc(
  hospitalized48_cart,
  main = "CART"
)
plot_varimp(
  hospitalized48_cart
)

LightRF

hospitalized48_lightrf <- train(
  datp,
  algorithm = "lightrf",
  outer_resampling_config = setup_Resampler(seed = 650)
)
2026-02-26 20:26:59  [train]
2026-02-26 20:26:59 Training set: 382 cases x 6 features. [summarize_supervised]
2026-02-26 20:26:59 // Max workers: 7 => Algorithm: 7; Tuning: 1; Outer Resampling: 1 [get_n_workers]
2026-02-26 20:26:59 <> Training LightRF Classification using 10 independent folds... [train]
2026-02-26 20:26:59 Input contains more than one column; stratifying on last. [resample]
2026-02-26 20:26:59 Using max n bins possible = 2. [kfold]
2026-02-26 20:27:03 </> Outer resampling done. [train]
<Resampled Classification Model>
LightRF (LightGBM Random Forest)
⟳ Tested using 10 independent folds.

  <Resampled Classification Training Metrics>
    Showing mean (sd) across resamples.
          Sensitivity: 0.805 (0.012)
          Specificity: 0.665 (0.023)
    Balanced_Accuracy: 0.735 (0.015)
                  PPV: 0.713 (0.015)
                  NPV: 0.767 (0.015)
                   F1: 0.756 (0.012)
             Accuracy: 0.736 (0.015)
                  AUC: 0.783 (0.006)
          Brier_Score: 0.220 (1.3e-03)

  <Resampled Classification Test Metrics>
    Showing mean (sd) across resamples.
          Sensitivity: 0.783 (0.097)
          Specificity: 0.644 (0.105)
    Balanced_Accuracy: 0.714 (0.065)
                  PPV: 0.698 (0.066)
                  NPV: 0.750 (0.097)
                   F1: 0.735 (0.060)
             Accuracy: 0.714 (0.065)
                  AUC: 0.732 (0.061)
          Brier_Score: 0.227 (0.008)

2026-02-26 20:27:03  Done in 4.67 seconds. [train]
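The Brier score reported for each model is the mean squared difference between the predicted probability and the observed 0/1 outcome; lower is better. A minimal sketch with toy, purely illustrative vectors:

```r
# Brier score: mean squared error of predicted probabilities
prob <- c(0.9, 0.2, 0.7, 0.4)   # predicted P(positive)
obs  <- c(1, 0, 1, 1)           # observed outcomes
mean((prob - obs)^2)            # 0.125
```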
plot_roc(
  hospitalized48_lightrf,
  main = "LightRF"
)
plot_varimp(
  hospitalized48_lightrf
)

LightGBM

hospitalized48_lightgbm <- train(
  datp,
  hyperparameters = setup_LightGBM(
    learning_rate = c(0.001, 0.01)
  ),
  outer_resampling_config = setup_Resampler(seed = 650)
)
2026-02-26 20:27:03  [train]
2026-02-26 20:27:03 Training set: 382 cases x 6 features. [summarize_supervised]
2026-02-26 20:27:03 // Max workers: 7 => Algorithm: 7; Tuning: 1; Outer Resampling: 1 [get_n_workers]
2026-02-26 20:27:03 <> Training LightGBM Classification using 10 independent folds... [train]
2026-02-26 20:27:03 Input contains more than one column; stratifying on last. [resample]
2026-02-26 20:27:03 Using max n bins possible = 2. [kfold]
2026-02-26 20:28:05 </> Outer resampling done. [train]
<Resampled Classification Model>
LightGBM (Gradient Boosting)
⚙ Tuned using exhaustive grid search.
⟳ Tested using 10 independent folds.

  <Resampled Classification Training Metrics>
    Showing mean (sd) across resamples.
          Sensitivity: 0.832 (0.023)
          Specificity: 0.741 (0.020)
    Balanced_Accuracy: 0.787 (0.017)
                  PPV: 0.768 (0.015)
                  NPV: 0.811 (0.023)
                   F1: 0.799 (0.017)
             Accuracy: 0.787 (0.017)
                  AUC: 0.871 (0.018)
          Brier_Score: 0.155 (0.016)

  <Resampled Classification Test Metrics>
    Showing mean (sd) across resamples.
          Sensitivity: 0.778 (0.086)
          Specificity: 0.660 (0.077)
    Balanced_Accuracy: 0.719 (0.061)
                  PPV: 0.704 (0.055)
                  NPV: 0.747 (0.084)
                   F1: 0.737 (0.060)
             Accuracy: 0.720 (0.061)
                  AUC: 0.792 (0.064)
          Brier_Score: 0.188 (0.031)

2026-02-26 20:28:05  Done in 1.00 minutes. [train]
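Supplying `learning_rate = c(0.001, 0.01)` is what triggered the grid search above; in general, a tuning grid is the cross product of all candidate values. A base-R illustration (the second parameter is hypothetical; rtemis builds its grid internally):

```r
# Illustrative two-parameter grid, not the grid rtemis actually used
grid <- expand.grid(
  learning_rate = c(0.001, 0.01),
  num_leaves    = c(16, 32)  # hypothetical second parameter
)
nrow(grid)  # 4 combinations
```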
plot_roc(
  hospitalized48_lightgbm,
  main = "LightGBM"
)
plot_varimp(
  hospitalized48_lightgbm
)

Present Results

present(
  list(
    hospitalized48_glmnet,
    hospitalized48_cart,
    hospitalized48_lightrf,
    hospitalized48_lightgbm
  )
)
Elastic Net (GLMNET), Classification and Regression Trees (CART), LightGBM Random Forest (LightRF), and Gradient Boosting (LightGBM) were used for Classification.
The top-performing model was CART with a test-set Balanced Accuracy of 0.742, followed by GLMNET, LightGBM, and LightRF with Balanced_Accuracy of 0.731, 0.719, and 0.714 respectively.